1. INTRODUCTION

In modern society, encryption is essential for information security. Speech scrambling has gained
widespread acceptance as an efficient method of enhancing protection in a variety of applications, in both
civilian and military settings. Scrambling alters the speech signal to make it unintelligible to
eavesdroppers [1]. Analog and digital speech scrambling are the two primary types of this form of encoding.
Analog scramblers are the most common and reliable type and are used extensively. They use either a
permutation of speech segments in the frequency, time, or time-frequency domain, or a permutation of the
transform coefficients of each speech segment.

Earlier speech scramblers modified the speech signal by means of specific matrices and transforms, such
as the Hadamard matrix, the Fibonacci transform, the fast Fourier transform (FFT), chaotic mapping, and
pseudo-random binary scrambling [2]-[4]. Later, quadrature amplitude modulation orthogonal frequency
division multiplexing (QAM-OFDM) was introduced to improve the bit error rate (BER) at the receiving
side [5], [6]. More recently, several scrambling methods for speech encryption have been developed using
techniques such as chaotic maps and K-means clustering [7]-[11]. The main disadvantage shared by these
methods is that they do not provide enough security against cryptanalysis: because of processing delays and
hardware limitations, the number of permuted elements is not large enough to yield a sufficient number of
distinct permutations [12].

To overcome this problem, this paper proposes a mixed transformation that combines the multiwavelet and
Arnold transforms, taking advantage of the multi-spectrum characteristics of the multiwavelet and the
shuffling characteristics of Arnold. In recent years, multiwavelets have joined the theory of wavelets. They
are vector-valued wavelets whose defining conditions use matrices rather than scalars, as is the case with
wavelets. This is an advantage because it makes it possible to create multiwavelet bases that simultaneously
possess several properties, such as symmetry, orthogonality, and a high number of vanishing moments. In
addition, the Arnold transform is typically used to scramble an image by rearranging its pixels [13]. For
scrambling audio data, a two-dimensional Arnold transform is well suited because it effectively removes the
correlation between the individual audio samples [14]. Arnold scrambling offers a high degree of scrambling.


2. THE COMPREHENSIVE THEORETICAL BASIS 
2.1. Speech scrambling 

Due to the widespread use of speech communication in areas such as the economy, the military, and trade,
information espionage, including illegal wiretapping and surveillance, has emerged in recent years. Speech
scrambling is based on the idea of modifying the signal at the transmitting end, while the receiver
descrambles it to recover the original signal. Thus, anyone listening on the transmission channel would hear
only a noisy, garbled version of the original audio signal [4].

Speech scramblers are classified into two categories: i) analog scramblers and ii) digital scramblers. In
“analog” scramblers, the processing is carried out digitally, so the transmission of the signal is the only
real analog operation. The incoming signal is first digitized, processed by the scrambling algorithm,
converted back to analog form, and transmitted to the receiver. At the receiver, the signal is digitized once
more, the scrambling is inverted, and the result is converted back into an analog signal so that the speech
can be reconstructed [15]. In digital encryption, the input speech signal is digitized, compressed to a bit
stream, encrypted, and then transmitted through the communication channel. Digital encryption is generally
considered more secure than analog encryption, but it requires a more complex implementation [16]. Analog
scrambling, on the other hand, does not require speech compression or a modem [17].

A speech security system alters the original signal by means of a specific coding algorithm, so that the
coded signal does not resemble the original. The coding algorithm is driven by a unique code known as a
“key”; different keys lead to different coded signals. The coded signal is then transmitted through the
communication channel. At the receiver, the original signal is recovered from the coded signal by the
decoding algorithm using the specified key. The decoding key must be the inverse of the transmission key. The
result is the recovered signal, which resembles the original signal [1]. The two common approaches to analog
scrambling are: i) frequency-domain and ii) time-domain methods.
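The key-and-inverse-key idea can be illustrated with a minimal time-domain sketch in Python. It is not the scheme proposed in this paper: it only permutes fixed-length segments, and the helper names, the integer key, and the use of a seeded pseudo-random shuffle as the key schedule are all assumptions made for illustration.

```python
import random


def make_key_permutation(key, n_segments):
    """Derive a segment permutation from an integer key (hypothetical key schedule)."""
    rng = random.Random(key)
    perm = list(range(n_segments))
    rng.shuffle(perm)
    return perm


def invert_permutation(perm):
    """Return the inverse permutation, which plays the role of the decoding key."""
    inverse = [0] * len(perm)
    for new_pos, old_pos in enumerate(perm):
        inverse[old_pos] = new_pos
    return inverse


def permute_segments(samples, perm, seg_len):
    """Reorder fixed-length segments: output segment i is input segment perm[i]."""
    out = []
    for old_pos in perm:
        out.extend(samples[old_pos * seg_len:(old_pos + 1) * seg_len])
    return out


samples = list(range(32))  # stand-in for digitized speech samples
seg_len = 4
perm = make_key_permutation(key=2024, n_segments=len(samples) // seg_len)
scrambled = permute_segments(samples, perm, seg_len)
recovered = permute_segments(scrambled, invert_permutation(perm), seg_len)
```

Applying the inverse permutation at the receiver restores the original segment order, which mirrors the requirement that the decoding key be the inverse of the transmission key.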


2.2. Arnold transform 

The Arnold transformation, also known as the cat face transformation, relocates each point of a matrix to
a new position in a deterministic way. As an illustration, let (x, y) be any point in a square matrix of size
N × N. The transformation that moves the point (x, y) to a new point (x', y') is given by (1).

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \bmod N \qquad (1)$$


The above transformation is known as the two-dimensional (2D) Arnold transformation [8]. The Arnold
transformation is periodic: after a specific number of iterations, every point returns to its original
location. For this reason, using the traditional Arnold transform for scrambling is unsafe. It is therefore
adjusted by adding two positive integer parameters a and b, which gives the transformation in (2), where N is
the size of the square matrix.

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & a \\ b & ab + 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \bmod N \qquad (2)$$


Because the parameters a and b can take different values, they act as part of the key: without knowing them,
it is difficult to return the points to their original locations. Adjusting the transform coefficients in
this way improves the security of the scrambling algorithm [8].
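The periodicity of the classical map and the role of a and b as key parameters can be checked with a short Python sketch. It assumes a square N × N matrix and the generalized matrix [[1, a], [b, ab + 1]], which has determinant 1 and is therefore invertible modulo N; the function names are hypothetical and the code is an illustration rather than the implementation used in this work.

```python
def arnold_forward(x, y, n, a=1, b=1):
    """Generalized Arnold map [[1, a], [b, a*b + 1]] modulo n (a = b = 1 is the classical map)."""
    return (x + a * y) % n, (b * x + (a * b + 1) * y) % n


def arnold_inverse(xp, yp, n, a=1, b=1):
    """Inverse map, using the inverse matrix [[a*b + 1, -a], [-b, 1]] modulo n."""
    return ((a * b + 1) * xp - a * yp) % n, (-b * xp + yp) % n


def scramble(matrix, a=1, b=1, rounds=1, inverse=False):
    """Move every element of a square matrix to its mapped position, `rounds` times."""
    n = len(matrix)
    step = arnold_inverse if inverse else arnold_forward
    for _ in range(rounds):
        out = [[None] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                xp, yp = step(x, y, n, a, b)
                out[xp][yp] = matrix[x][y]
        matrix = out
    return matrix


def period(n, a=1, b=1):
    """Smallest k >= 1 such that k iterations return every point to its start."""
    points = [(x, y) for x in range(n) for y in range(n)]
    current, k = points, 0
    while True:
        current = [arnold_forward(x, y, n, a, b) for x, y in current]
        k += 1
        if current == points:
            return k


grid = [[4 * r + c for c in range(4)] for r in range(4)]
scrambled = scramble(grid, a=2, b=3, rounds=2)
restored = scramble(scrambled, a=2, b=3, rounds=2, inverse=True)
```

A receiver that knows a, b, and the number of rounds can undo the scrambling directly with the inverse map, whereas an attacker who relies only on the period of the classical map would first have to guess the parameters.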


2.3. Multiwavelet transform 

Multiwavelets are based on multiresolution analysis (MRA), like scalar wavelets. An MRA gives a framework
for examining functions at different scales. The standard multiresolution has one scaling function
$\phi(t)$ [18]. Multiwavelets are an extension of wavelets. The distinction is that a multiwavelet possesses
two or more scaling and wavelet functions, whereas a wavelet possesses only one scaling function and one
wavelet function [19].

The set of scaling functions can be written in vector notation as

$$\Phi(t) = [\phi_1(t) \ \ \phi_2(t) \ \ \cdots \ \ \phi_r(t)]^T \qquad (3)$$

where $\Phi(t)$ is called the multiscaling function. In a similar manner, the multiwavelet function is
defined from the set of wavelet functions as (4).

$$\Psi(t) = [\psi_1(t) \ \ \psi_2(t) \ \ \cdots \ \ \psi_r(t)]^T \qquad (4)$$

When r = 1, $\Psi(t)$ is referred to as a scalar wavelet, or simply a wavelet. Although r can theoretically
be arbitrarily large, the multiwavelets studied so far are primarily for r = 2.

The GHM filter is a well-known multiwavelet, initially proposed by Geronimo, Hardin, and Massopust. This
basis offers a combination of symmetry, orthogonality, and compact support that no scalar wavelet basis can
achieve [20]. In accordance with (3) and (4), the two GHM scaling functions and two wavelet functions satisfy
the following two-scale equations:

$$\begin{bmatrix} \phi_1(t) \\ \phi_2(t) \end{bmatrix} = \sqrt{2} \sum_{k} H_k \begin{bmatrix} \phi_1(2t-k) \\ \phi_2(2t-k) \end{bmatrix} \qquad (5)$$

$$\begin{bmatrix} \psi_1(t) \\ \psi_2(t) \end{bmatrix} = \sqrt{2} \sum_{k} G_k \begin{bmatrix} \phi_1(2t-k) \\ \phi_2(2t-k) \end{bmatrix} \qquad (6)$$

where, for the GHM system, $H_k$ refers to the four scaling matrices $H_0$, $H_1$, $H_2$, and $H_3$, and $G_k$
to the corresponding wavelet matrices.
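The orthogonality of the GHM system can be checked numerically with a short Python sketch. The coefficient values below are not listed in this paper; they are the GHM matrices as commonly quoted in the multiwavelet literature and should be treated as an assumption here. The helper names are hypothetical.

```python
import math

S2 = math.sqrt(2.0)

# GHM low-pass (H) and high-pass (G) 2x2 matrix coefficients, as commonly quoted
# in the multiwavelet literature (assumed values, not taken from this paper).
H = [
    [[3 / (5 * S2), 4 / 5], [-1 / 20, -3 / (10 * S2)]],
    [[3 / (5 * S2), 0.0], [9 / 20, 1 / S2]],
    [[0.0, 0.0], [9 / 20, -3 / (10 * S2)]],
    [[0.0, 0.0], [-1 / 20, 0.0]],
]
G = [
    [[-1 / 20, -3 / (10 * S2)], [1 / (10 * S2), 3 / 10]],
    [[9 / 20, -1 / S2], [-9 / (10 * S2), 0.0]],
    [[9 / 20, -3 / (10 * S2)], [9 / (10 * S2), -3 / 10]],
    [[-1 / 20, 0.0], [-1 / (10 * S2), 0.0]],
]


def mat_mul_t(p, q):
    """Return P * Q^T for two 2x2 matrices."""
    return [[sum(p[i][k] * q[j][k] for k in range(2)) for j in range(2)] for i in range(2)]


def correlation(p_list, q_list, shift=0):
    """Sum over k of P_k * Q_{k + 2*shift}^T, using only indices that exist."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(len(p_list)):
        j = k + 2 * shift
        if 0 <= j < len(q_list):
            prod = mat_mul_t(p_list[k], q_list[j])
            for r in range(2):
                for c in range(2):
                    total[r][c] += prod[r][c]
    return total


def max_abs_diff(m, target):
    """Largest absolute entry-wise difference between two 2x2 matrices."""
    return max(abs(m[r][c] - target[r][c]) for r in range(2) for c in range(2))


IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
ZERO = [[0.0, 0.0], [0.0, 0.0]]

print(max_abs_diff(correlation(H, H), IDENTITY))  # close to zero for an orthogonal system
print(max_abs_diff(correlation(H, G), ZERO))
```

With these values, the sums of $H_k H_k^T$ and $G_k G_k^T$ equal the identity and the cross terms vanish, which is the matrix analogue of the orthonormality conditions for scalar filter banks.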
The 2×2 matrix filters in multiwavelet filter banks require vector-valued input. This raises an additional
problem when multiwavelets are used in the transformation process: the scalar-valued input signal must first
be converted into a suitable vector-valued signal. This conversion is called preprocessing [21]-[24]. The
wavelet and multiwavelet transforms are directly applicable to one-dimensional signals only. However, speech
is treated here as a two-dimensional signal, so a way is needed to process it with a 1-D transform. There are
two main approaches: separable and non-separable algorithms. Separable methods operate on each dimension in
sequence; the standard procedure processes each row in turn and then each column of the output.
Non-separable methods operate on both dimensions simultaneously [25]. The discrete multiwavelet transform is
computed with matrix-valued filter coefficients in place of scalars. The transform matrix can be written as
follows:


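$$W = \begin{bmatrix} H_0 & H_1 & H_2 & H_3 & 0 & 0 & \cdots \\ G_0 & G_1 & G_2 & G_3 & 0 & 0 & \cdots \\ 0 & 0 & H_0 & H_1 & H_2 & H_3 & \cdots \\ 0 & 0 & G_0 & G_1 & G_2 & G_3 & \cdots \\ \vdots & & & & & & \ddots \end{bmatrix}$$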
where $H_i$ and $G_i$ are the low-pass and high-pass filter impulse responses, which are 2×2 matrices [26].
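As a rough illustration of how such a block-banded transform matrix can be assembled and inverted, the Python sketch below stacks the block rows with a shift of two blocks per row and periodic wrap-around. The wrap-around boundary handling and the function names are assumptions, and the 2×2 blocks used in the example are simple Haar-like placeholders chosen so that the matrix is orthogonal; the GHM matrices would be substituted in practice.

```python
import math


def build_transform_matrix(h_blocks, g_blocks, n_vectors):
    """Stack block rows [H0 H1 H2 H3] and [G0 G1 G2 G3], shifting by two blocks
    per row pair and wrapping around periodically (assumed boundary handling)."""
    size = 2 * n_vectors
    matrix = []
    for i in range(n_vectors // 2):
        for blocks in (h_blocks, g_blocks):
            rows = [[0.0] * size for _ in range(2)]
            for k, block in enumerate(blocks):
                col = 2 * ((2 * i + k) % n_vectors)
                for r in range(2):
                    for c in range(2):
                        rows[r][col + c] += block[r][c]
            matrix.extend(rows)
    return matrix


def apply_matrix(matrix, vector):
    """Forward transform: y = W x."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def apply_transpose(matrix, vector):
    """Inverse transform for an orthogonal W: x = W^T y."""
    n_cols = len(matrix[0])
    return [sum(matrix[r][c] * vector[r] for r in range(len(matrix))) for c in range(n_cols)]


# Placeholder blocks (two independent Haar channels), NOT the GHM coefficients.
s = 1 / math.sqrt(2)
POS = [[s, 0.0], [0.0, s]]
NEG = [[-s, 0.0], [0.0, -s]]
ZERO = [[0.0, 0.0], [0.0, 0.0]]
h_blocks = [POS, POS, ZERO, ZERO]
g_blocks = [POS, NEG, ZERO, ZERO]

W = build_transform_matrix(h_blocks, g_blocks, n_vectors=4)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # preprocessed (vector-valued) input, flattened
coeffs = apply_matrix(W, x)
x_rec = apply_transpose(W, coeffs)
```

Because the rows of W are orthonormal for an orthogonal filter set, the inverse transform is simply the transpose, which is what makes near-lossless descrambling possible.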


3. METHOD 

The proposed system is one of many encryption schemes designed to safeguard and protect audio data. This
section describes a hybrid speech scrambling system that makes use of two transforms, Arnold and
multiwavelet. The system is implemented in MATLAB. The process data flow diagram in Figure 1 describes the
proposed system.

Figure 1. Process data flow diagram of the proposed scheme


Initially, the sound signal is read in the form of audio samples. The first step is preprocessing, which
divides the samples into frames of 128 samples each, so that the vector of audio samples is converted to a
matrix of size (400*128). The second step is to multiply each row of the matrix by the multiwavelet matrix of
size (400*256). The outcome of this step is a transformed matrix of size (400*128). The final step is to
multiply the multiwavelet outcome by the Arnold scrambling matrix to permute the frequency bands using (1).
At last, the matrix is converted to a one-dimensional vector for transmission. The proposed steps are
illustrated in Algorithm 1.


4. RESULTS AND DISCUSSION 

Several audio samples were used as test material in order to study the performance of the developed
scheme, taking into account the following audio parameters: i) audio sample resolution, ii) sampling rate,
and iii) recording time. Table 1 lists the attributes of these samples, while Figures 2, 3, 4, and 5 present
their waveforms. Note that when an audio sample is stereo and therefore has two channels, only one of these
channels is tested.

Table 2 shows the effect of the proposed method in comparison with different discrete wavelet transforms
(Symlet, Haar, and Daubechies): it lists the segmental peak signal-to-noise ratio (PSNR) distance measure and
the mean square error (MSE) between the original and recovered speech. Table 3 lists the same measures
between the original and scrambled speech. Figures 6, 7, 8, and 9 show the waveforms of the test samples
after scrambling by the proposed method. Finally, Figure 10 shows the estimated time taken to scramble and
descramble each audio sample. Algorithm 1 illustrates the procedural steps of the proposed scheme.


Algorithm 1. The proposed algorithm
1. Read the source speech signal to construct a one-dimensional array of samples,
   Wave_Samples(No. of samples).

2. Calculate the number of frames using the following equation:
       n_Frames = floor(No. of wave samples / f_size)
   where f_size is the number of samples in each frame (128).

3. Define Original_matrix of size (n_Frames*128) to split the signal into frames, where each
   frame has 128 samples.

       temp = 0;
       for i = 1 : n_Frames
           Original_matrix(i,:) = Wave_Samples(temp + 1 : temp + f_size);
           temp = temp + f_size;
       end

4. Construct a matrix original_frames of size (400*f_size), i.e. (400*128), since 400 frames
   are processed at a time.

       if (n_Frames > 400)
           for i = 1 : 400
               original_frames(i,:) = Original_matrix(i,:);
           end
           n_Frames = 400;
       end

5. Apply the multiwavelet transform to each row original_frames(i,:), starting with the
   preprocessing of the row (shown here for row 1):

       for i = 1 : 128
           Preprocess_row(1, i+n)   = original_frames(1, i);
           Preprocess_row(1, i+n+1) = original_frames(1, i) / sqrt(2);
       end

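The framing and row preprocessing steps of Algorithm 1 can be sketched in Python as follows. This is a hedged illustration, not the MATLAB implementation used in this work: the function names are hypothetical, and the interleaved layout and the 1/sqrt(2) scale of the repeated sample are assumptions based on the usual repeated-row preprocessing for multiwavelets.

```python
import math

FRAME_SIZE = 128   # samples per frame, as in Algorithm 1
MAX_FRAMES = 400   # frames processed at a time, as in Algorithm 1


def split_into_frames(samples, frame_size=FRAME_SIZE, max_frames=MAX_FRAMES):
    """Cut the sample vector into whole frames and keep at most max_frames of them."""
    n_frames = min(len(samples) // frame_size, max_frames)
    return [samples[i * frame_size:(i + 1) * frame_size] for i in range(n_frames)]


def repeated_row_preprocess(frame, scale=1 / math.sqrt(2)):
    """Turn a scalar frame into a vector-valued one by following each sample
    with a scaled copy of itself (assumed repeated-row preprocessing)."""
    out = []
    for sample in frame:
        out.append(sample)
        out.append(sample * scale)
    return out


signal = [math.sin(0.01 * n) for n in range(11950)]   # synthetic stand-in for a speech file
frames = split_into_frames(signal)
preprocessed = [repeated_row_preprocess(frame) for frame in frames]
```

Each 128-sample frame becomes a 256-element row, which matches the width of the multiwavelet matrix mentioned in the method description.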


Table 1. The attributes of the audio test samples
File name   No. of channels   Sample rate (Hz)   Total samples   Duration (s)   Bits per sample   No. of frames
Handel      Mono              8192               73113           8.924          16                571
FIRE        Mono              11000              11950           1.086          16                93
HALDOING    Mono              11025              21944           1.990          16                171
GIVBREAK    Mono              11025              19151           1.737          16                149

Figure 2. Original waveform of FIRE sample
Figure 3. Original waveform of GIVBREAK sample
Figure 4. Original waveform of HALDOING sample
Figure 5. Original waveform of Handel sample


Table 2. Effect of the proposed method on PSNR and MSE between original and recovered signal
File name   Db2 MSE      Db2 PSNR   Sym2 MSE     Sym2 PSNR   Haar MSE     Haar PSNR   Proposed MSE   Proposed PSNR
GIVBREAK    8.1238e-27   656.2017   8.1238e-27   656.2017    1.5701e-33   810.7934    7.6462e-35     841.0145
HALDOING    4.1433e-26   639.9090   4.1433e-26   639.9090    6.9641e-33   795.8971    4.3248e-34     823.6869
Handel      1.1524e-25   629.679    1.1524e-25   629.679     5.9539e-33   797.464     3.9935e-34     824.4840
FIRE        2.4706e-25   622.0534   2.4706e-25   622.0534    1.4485e-33   788.5740    1.1786e-33     813.6617


Table 3. Effect of the proposed method on PSNR and MSE between original and scrambled signal
File name   Db2 MSE   Db2 PSNR   Sym2 MSE   Sym2 PSNR   Haar MSE   Haar PSNR   Proposed MSE   Proposed PSNR
GIVBREAK    0.0174    95.9516    0.0174     95.9748     0.0174     95.9905     0.0559         84.2943
HALDOING    0.0853    80.0691    0.0859     79.995      0.0857     80.0178     0.2688         68.5893
Handel      0.0769    81.0989    0.0772     81.0658     0.0766     81.1371     0.2021         71.4423
FIRE        0.2102    71.0491    0.2090     71.1043     0.2121     70.9590     0.5147         62.0933

Figure 6. Scrambled waveform of FIRE sample
Figure 7. Scrambled waveform of GIVBREAK sample
Figure 8. Scrambled waveform of HALDOING sample
Figure 9. Scrambled waveform of Handel sample
Figure 10. Estimated time needed for each sample in seconds


Figures 6, 7, 8, and 9 show that the waveform of the signal scrambled by the proposed scheme is
completely modified and differs from the waveforms of the original signals shown in Figures 2, 3, 4, and 5.
Table 2 shows that the proposed system gives a very low MSE, which indicates that the signal is descrambled
with negligible error. Table 3 shows that the proposed method scrambles the signal effectively: the MSE
between the scrambled and original signals is higher than for the other transforms, which means that the
transmitted scrambled signal is not understandable by an eavesdropper. Finally, Figure 10 shows that the
estimated time needed to scramble each sample is very short, approximately 1 second per sample.
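For reference, the two quality measures can be computed as in the following Python sketch. It uses the standard decibel definition of PSNR with a caller-supplied peak value, and a segmental variant that averages the per-frame PSNR. The paper does not state the peak value, logarithm base, or segment length behind Tables 2 and 3, so this sketch is not expected to reproduce the tabulated numbers; the function names and defaults are assumptions.

```python
import math


def mse(reference, test):
    """Mean square error between two equally long sample sequences."""
    if len(reference) != len(test):
        raise ValueError("sequences must have the same length")
    return sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)


def psnr_db(reference, test, peak=1.0):
    """Standard PSNR in decibels: 10 * log10(peak^2 / MSE)."""
    error = mse(reference, test)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)


def segmental_psnr_db(reference, test, frame_size=128, peak=1.0):
    """Average of the per-frame PSNR values, skipping frames with zero error."""
    values = []
    for start in range(0, len(reference) - frame_size + 1, frame_size):
        value = psnr_db(reference[start:start + frame_size],
                        test[start:start + frame_size], peak)
        if math.isfinite(value):
            values.append(value)
    return sum(values) / len(values) if values else math.inf


original = [math.sin(0.05 * n) for n in range(512)]
noisy = [v + 0.01 for v in original]          # stand-in for a recovered signal
print(mse(original, noisy), psnr_db(original, noisy))
```

A low MSE (high PSNR) between original and recovered speech indicates accurate descrambling, while a high MSE (low PSNR) between original and scrambled speech indicates strong scrambling.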


5. CONCLUSION

Using the multiwavelet and Arnold transforms gives better PSNR values than conventional wavelet schemes,
while the estimated time (ET) remains the same in all schemes. The performance of the proposed system was
tested using three evaluation measures (MSE, PSNR, and ET), and the results showed that the proposed system
has potential. The proposed method produces a heavily scrambled signal that has no correlation with the
original signal. The waveform of the scrambled signal is irregular and highly distorted, which lowers the
residual intelligibility of the scrambled speech. For future work, optimization algorithms such as particle
swarm optimization (PSO) and grey wolf optimization can be combined with several permutation algorithms. The
scheme can also be converted from the frequency domain to the time domain by using Arnold and PSO only,
without any frequency transform.